[AMDGPU] Enable i8 GEP promotion for vector allocas #166132

harrisonGPU · 2025-11-03T08:09:53Z

This patch adds support for the pattern:

  %index = select i1 %idx_sel, i32 0, i32 4
  %elt = getelementptr inbounds i8, ptr addrspace(5) %alloca, i32 %index

by scaling the byte offset to an element index (index >> log2(ElemSize)),
allowing the vector element to be updated with insertelement instead of using
scratch memory.
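
For illustration only, the scaling step described above can be sketched in standalone C++. The helper names `log2Floor` and `byteOffsetToElementIndex` are hypothetical and do not appear in the patch; the sketch assumes the element size is a power of two and the byte offset is a multiple of it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch: floor(log2(x)) for x > 0.
unsigned log2Floor(uint64_t x) {
  unsigned shift = 0;
  while (x >>= 1)
    ++shift;
  return shift;
}

// Convert a byte offset into a vector element index by shifting right by
// log2(elemSizeBytes). Only meaningful when elemSizeBytes is a power of two
// and byteOffset is a multiple of elemSizeBytes.
uint32_t byteOffsetToElementIndex(uint32_t byteOffset, uint32_t elemSizeBytes) {
  assert(elemSizeBytes != 0 && (elemSizeBytes & (elemSizeBytes - 1)) == 0 &&
         "element size must be a power of two");
  return byteOffset >> log2Floor(elemSizeBytes);
}
```

With 4-byte float elements, the two arms of the select in the pattern (byte offsets 0 and 4) map to element indices 0 and 1.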

llvmbot · 2025-11-03T08:10:25Z

@llvm/pr-subscribers-backend-amdgpu

Author: Harrison Hao (harrisonGPU)

Full diff: https://github.com/llvm/llvm-project/pull/166132.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+13-2)
(modified) llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll (+20)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index ddabd25894414..793c0237cdf38 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -456,10 +456,21 @@ static Value *GEPToVectorIndex(GetElementPtrInst *GEP, AllocaInst *Alloca,
   const auto &VarOffset = VarOffsets.front();
   APInt OffsetQuot;
   APInt::sdivrem(VarOffset.second, VecElemSize, OffsetQuot, Rem);
-  if (Rem != 0 || OffsetQuot.isZero())
-    return nullptr;
+
+  Value *Scaled = nullptr;
+  if (Rem != 0 || OffsetQuot.isZero()) {
+    unsigned ElemSizeShift = Log2_64(VecElemSize);
+    Scaled = Builder.CreateLShr(VarOffset.first, ElemSizeShift);
+    if (Instruction *NewInst = dyn_cast<Instruction>(Scaled))
+      NewInsts.push_back(NewInst);
+    OffsetQuot = APInt(BW, 1);
+    Rem = 0;
+  }
 
   Value *Offset = VarOffset.first;
+  if (Scaled)
+    Offset = Scaled;
+
   auto *OffsetType = dyn_cast<IntegerType>(Offset->getType());
   if (!OffsetType)
     return nullptr;
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll
index 76e1868b3c4b9..65bddaba8dd14 100644
--- a/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll
@@ -250,6 +250,26 @@ bb2:
   store i32 0, ptr addrspace(5) %extractelement
   ret void
 }
+
+define amdgpu_kernel void @scalar_alloca_vector_gep_i8(ptr %buffer, float %data, i32 %index) {
+; CHECK-LABEL: define amdgpu_kernel void @scalar_alloca_vector_gep_i8(
+; CHECK-SAME: ptr [[BUFFER:%.*]], float [[DATA:%.*]], i32 [[INDEX:%.*]]) {
+; CHECK-NEXT:    [[ALLOCA:%.*]] = freeze <3 x float> poison
+; CHECK-NEXT:    [[VEC:%.*]] = load <3 x float>, ptr [[BUFFER]], align 16
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 [[INDEX]], 2
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <3 x float> [[VEC]], float [[DATA]], i32 [[TMP1]]
+; CHECK-NEXT:    store <3 x float> [[TMP2]], ptr [[BUFFER]], align 16
+; CHECK-NEXT:    ret void
+;
+  %alloca = alloca <3 x float>, align 16, addrspace(5)
+  %vec = load <3 x float>, ptr %buffer
+  store <3 x float> %vec, ptr addrspace(5) %alloca
+  %elt = getelementptr inbounds nuw i8, ptr addrspace(5) %alloca, i32 %index
+  store float %data, ptr addrspace(5) %elt, align 4
+  %updated = load <3 x float>, ptr addrspace(5) %alloca, align 16
+  store <3 x float> %updated, ptr %buffer, align 16
+  ret void
+}
 ;.
 ; CHECK: [[META0]] = !{}
 ; CHECK: [[RNG1]] = !{i32 0, i32 1025}

shiltian

What if the index here is somewhere in the middle? It doesn't look like we have any constraint on the index.

arsenm · 2025-11-04T04:25:51Z

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

+
+  Value *Scaled = nullptr;
+  if (Rem != 0 || OffsetQuot.isZero()) {
+    unsigned ElemSizeShift = Log2_64(VecElemSize);


Need to validate VecElemSize is a power of 2?

Thanks, but I don't think an explicit power-of-two check on the element size is necessary, because the existing check here already covers it:

llvm-project/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp, lines 871 to 877 in 52fdcf9:

  Type *VecEltTy = VectorTy->getElementType();
  unsigned ElementSizeInBits = DL->getTypeSizeInBits(VecEltTy);
  if (ElementSizeInBits != DL->getTypeAllocSizeInBits(VecEltTy)) {
    LLVM_DEBUG(dbgs() << " Cannot convert to vector if the allocation size "
                         "does not match the type's size\n");
    return false;
  }

If the element type's size does not match its allocation size, this check returns false, which also rejects non-power-of-2 element sizes such as i24.
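
If an explicit guard were preferred over relying on the allocation-size check, it might look roughly like the standalone sketch below. The name `tryGetElemSizeShift` is hypothetical; in-tree code would more likely use `isPowerOf2_64` and `Log2_64` from llvm/Support/MathExtras.h. This is only a sketch of the idea the reviewer raised, not what the patch does.

```cpp
#include <cstdint>

// Writes log2(elemSizeBytes) to shiftOut and returns true when the size is a
// nonzero power of two. Returns false otherwise, meaning the caller should
// not rewrite the byte offset as a shifted element index.
bool tryGetElemSizeShift(uint64_t elemSizeBytes, unsigned &shiftOut) {
  if (elemSizeBytes == 0 || (elemSizeBytes & (elemSizeBytes - 1)) != 0)
    return false;
  unsigned shift = 0;
  while ((elemSizeBytes >> shift) != 1)
    ++shift;
  shiftOut = shift;
  return true;
}
```

A 4-byte element yields a shift of 2, while a 3-byte element (the size of an i24) is rejected.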

arsenm · 2025-11-04T04:26:03Z

llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep.ll

+  %updated = load <3 x float>, ptr addrspace(5) %alloca, align 16
+  store <3 x float> %updated, ptr %buffer, align 16
+  ret void
+}


Test with non-power-of-2 element size

harrisonGPU · 2025-11-04T09:40:36Z

> What if the index here is somewhere in the middle? It doesn't look like we have any constraint on the index.

Could you please give me an example?

shiltian · 2025-11-04T16:00:24Z

> Could you please give me an example?

In the test case, %index addresses %alloca, a <3 x float> that is 12 bytes in total with 4 bytes per element. Since the GEP has type i8, what if %index is 5? That would point somewhere into the middle of the 2nd element of %alloca.
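
The arithmetic behind this example can be checked with a small standalone sketch. The `ElementAccess` struct and `splitByteOffset` function are hypothetical names used only for illustration.

```cpp
#include <cstdint>

struct ElementAccess {
  uint32_t index;     // which element the byte offset lands in
  uint32_t remainder; // bytes past the start of that element
};

// Split a byte offset into (element index, offset within that element).
// A nonzero remainder means the access starts in the middle of an element.
ElementAccess splitByteOffset(uint32_t byteOffset, uint32_t elemSizeBytes) {
  return {byteOffset / elemSizeBytes, byteOffset % elemSizeBytes};
}
```

For byte offset 5 and 4-byte elements this gives index 1 with remainder 1. A 4-byte store at that offset covers bytes 5 through 8 and so straddles two elements, which a single insertelement at index 5 >> 2 = 1 cannot represent.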


harrisonGPU · 2025-11-10T08:28:04Z

> > Could you please give me an example?
>
> In the test case, %index addresses %alloca, a <3 x float> that is 12 bytes in total with 4 bytes per element. Since the GEP has type i8, what if %index is 5? That would point somewhere into the middle of the 2nd element of %alloca.

Hi @shiltian, thank you very much for pointing this out. I've thought it over.
I now think we should only promote when the variable index is guaranteed to be aligned to the element size.
We can use computeKnownBits and countMinTrailingZeros to check that the low bits of the index are known to be zero, which verifies its alignment before promoting.
I've updated the lit test and commit message accordingly. What do you think?
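
As a rough illustration of that check, the sketch below models known-zero bits in standalone C++. `ToyKnownBits` and the functions around it are hypothetical stand-ins for LLVM's `KnownBits` and `computeKnownBits`. They handle only constants and selects and are not the code in the patch.

```cpp
#include <cstdint>

// Toy stand-in for LLVM's KnownBits: a set bit in `zero` means that bit of
// the value is known to be 0.
struct ToyKnownBits {
  uint32_t zero;
};

// Known bits of a constant: every bit that is 0 in the constant is known zero.
ToyKnownBits knownBitsOfConstant(uint32_t c) { return {~c}; }

// Known bits of `select cond, a, b`: only bits known zero in both arms.
ToyKnownBits knownBitsOfSelect(ToyKnownBits a, ToyKnownBits b) {
  return {a.zero & b.zero};
}

// Number of low bits guaranteed to be zero.
unsigned minTrailingZeros(ToyKnownBits k) {
  unsigned n = 0;
  while (n < 32 && ((k.zero >> n) & 1u))
    ++n;
  return n;
}

// The index is safe to scale when its low log2(elemSize) bits are known zero.
bool isKnownElementAligned(ToyKnownBits k, unsigned elemSizeShift) {
  return minTrailingZeros(k) >= elemSizeShift;
}
```

For `select i1 %idx_sel, i32 0, i32 4` both arms have their two low bits clear, so the index is known to be 4-byte aligned. A completely unknown `i32 %index` has no known-zero bits and would not be promoted under this rule.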

shiltian · 2025-11-18T18:22:51Z

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

-    return nullptr;
-
  Value *Offset = VarOffset.first;
+  if (Rem != 0 || OffsetQuot.isZero()) {


Does it have to be this complicated? I thought checking whether offset % size would be sufficient?

Thanks, I agree with your point. I have removed it.

github-actions · 2025-11-20T03:31:12Z

🐧 Linux x64 Test Results

186411 tests passed
4868 tests skipped

harrisonGPU requested review from arsenm, jayfoad, perlfu, ritter-x2a, ruiling and shiltian November 3, 2025 08:09

harrisonGPU self-assigned this Nov 3, 2025

llvmbot added the backend:AMDGPU label Nov 3, 2025

shiltian reviewed Nov 3, 2025


arsenm reviewed Nov 4, 2025


harrisonGPU added 2 commits November 6, 2025 14:34

[AMDGPU] Enable i8 GEP promotion for vector allocas

a69372f

[AMDGPU] Use computeKnownBits to check if it points the middle of an …

6a28740

…element

harrisonGPU force-pushed the amdgpu/promote-vector branch from c1439a3 to 6a28740 Compare November 10, 2025 08:22

shiltian reviewed Nov 18, 2025


Remove unnecessary check.

19584ca
